Similarity Component Analysis

Authors

  • Soravit Changpinyo
  • Kuan Liu
  • Fei Sha
Abstract

Measuring similarity is crucial to many learning tasks. To this end, metric learning has been the dominant paradigm. However, similarity is a richer and broader notion than what metrics entail. For example, similarity can arise from the process of aggregating the decisions of multiple latent components, where each latent component compares data in its own way by focusing on a different subset of features. In this paper, we propose Similarity Component Analysis (SCA), a probabilistic graphical model that discovers those latent components from data. In SCA, a latent component generates a local similarity value, computed with its own metric, independently of other components. The final similarity measure is then obtained by combining the local similarity values with a (noisy-)OR gate. We derive an EM-based algorithm for fitting the model parameters with similarity-annotated data from pairwise comparisons. We validate the SCA model on synthetic datasets where SCA discovers the ground-truth about the latent components. We also apply SCA to a multiway classification task and a link prediction task. For both tasks, SCA attains significantly better prediction accuracies than competing methods. Moreover, we show how SCA can be instrumental in exploratory analysis of data, where we gain insights about the data by examining patterns hidden in its latent components’ local similarity values.
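The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The diagonal metric, the logistic link, the bias values, and every function name are assumptions made for illustration. The gate is a plain OR over independent component probabilities, so the noise parameters of a noisy-OR and the EM fitting procedure are left out.

```python
import math


def local_similarity(u, v, weights, bias):
    """Probability that one latent component judges u and v similar.

    Assumptions (not from the abstract): a diagonal metric given by
    nonnegative per-feature weights, and a logistic link with a bias.
    """
    dist = sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v))
    return 1.0 / (1.0 + math.exp(dist - bias))


def or_gate(local_probs):
    """Combine independent local probabilities with an OR gate.

    The pair counts as similar if at least one component says so.
    """
    not_similar = 1.0
    for p in local_probs:
        not_similar *= 1.0 - p
    return 1.0 - not_similar


def sca_similarity(u, v, components):
    """Final similarity from a list of (weights, bias) components."""
    return or_gate(local_similarity(u, v, w, b) for w, b in components)


# Two hypothetical components, each focusing on a different feature.
components = [([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)]
u, v = [0.0, 0.0], [0.0, 5.0]
locals_ = [local_similarity(u, v, w, b) for w, b in components]
s = sca_similarity(u, v, components)
```

In this toy case the two points agree on the first feature and differ on the second, so the first component's local similarity is high, the second's is near zero, and the OR gate still reports the pair as likely similar.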


Similar resources

Improving the quality of images synthesized by discrete cosines transform – regression based method using principle component analysis

Purpose: Different views of an individual’s image may be required for proper face recognition. Recently, a discrete cosine transform (DCT) based method has been used to synthesize virtual views of an image using only one frontal image. In this work, the performance of two different algorithms was examined to produce virtual views of one frontal image. Materials and Methods: Two new meth...


Detection of Fake Accounts in Social Networks Based on One Class Classification

Detection of fake accounts on social networks is a challenging process. The previous methods in identification of fake accounts have not considered the strength of the users’ communications, hence reducing their efficiency. In this work, we are going to present a detection method based on the users’ similarities considering the network communications of the users. In the first step, similarity ...


A New Nonlinear Fuzzy Robust PCA Algorithm and Similarity Classifier in Classification of Medical Data Sets

In this article a classification method is proposed where data is first preprocessed using new nonlinear fuzzy robust principal component analysis (NFRPCA) algorithm to get data into more feasible form. After this preprocessing step the similarity classifier is then used for the actual classification. The procedure was tested for dermatology, hepatitis and liver-disorder data. Results were quit...


Monitoring and assessment of a eutrophicated coastal lake using multivariate approaches

Multivariate statistical techniques such as cluster analysis, multidimensional scaling and principal component analysis were applied to evaluate the temporal and spatial variations in water quality data set generated for two years (2008-2010) from six monitoring stations of Veli-Akkulam Lake and compared with a regional reference lake Vellayani of south India. Seasonal variations of 14 differen...


Clustering of Multivariate Time-Series Data

A new methodology for clustering multivariate time-series data is proposed. The methodology is based on calculation of the degree of similarity between multivariate time-series datasets using two similarity factors. One similarity factor is based on principal component analysis and the angles between the principal component subspaces while the other is based on the Mahalanobis distance between ...




Publication date: 2013